    Learning First-Order Definitions of Functions

    First-order learning involves finding a clause-form definition of a relation from examples of the relation and relevant background information. In this paper, a particular first-order learning system is modified to customize it for finding definitions of functional relations. This restriction leads to faster learning times and, in some cases, to definitions that have higher predictive accuracy. Other first-order learning systems might benefit from similar specialization. Comment: See http://www.jair.org/ for any accompanying file

    Unitary groups over local rings

    Structural properties of unitary groups over local, not necessarily commutative, rings are developed, with applications to the computation of the orders of these groups (when finite) and to the degrees of the irreducible constituents of the Weil representation of a unitary group associated to a ramified extension of finite local rings.

    Improved Use of Continuous Attributes in C4.5

    A reported weakness of C4.5 in domains with continuous attributes is addressed by modifying the formation and evaluation of tests on continuous attributes. An MDL-inspired penalty is applied to such tests, eliminating some of them from consideration and altering the relative desirability of all tests. Empirical trials show that the modifications lead to smaller decision trees with higher predictive accuracies. Results also confirm that a new version of C4.5 incorporating these changes is superior to recent approaches that use global discretization and that construct small trees with multi-interval splits. Comment: See http://www.jair.org/ for any accompanying file
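The idea of penalizing tests on continuous attributes can be sketched roughly as follows. This is an illustrative sketch only, not the paper's implementation: the abstract does not give the formula, so the penalty used here, log2(N - 1) / |D| for N distinct attribute values among |D| training cases, is an assumption, and the names `entropy` and `best_threshold` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, penalized_gain) for one continuous attribute.

    Candidate thresholds are midpoints between adjacent distinct values.
    The penalty log2(N - 1) / |D| is an ASSUMED form of an MDL-style
    correction; it is not taken from the abstract.
    """
    n = len(values)
    distinct = sorted(set(values))
    if len(distinct) < 2:
        return None, 0.0  # no binary split is possible

    base = entropy(labels)
    best_t, best_gain = None, -math.inf
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        gain = (base
                - (len(left) / n) * entropy(left)
                - (len(right) / n) * entropy(right))
        if gain > best_gain:
            best_t, best_gain = t, gain

    # N distinct values give N - 1 candidate thresholds to choose from.
    penalty = math.log2(len(distinct) - 1) / n
    return best_t, best_gain - penalty


threshold, gain = best_threshold([1, 2, 3, 4], ["a", "a", "b", "b"])
print(threshold, round(gain, 3))
```

Under this assumed penalty, an attribute with many distinct values must achieve a correspondingly larger raw gain before its test is preferred, and a test whose penalized gain is not positive can be dropped from consideration.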

    Learning a Static Analyzer from Data

    To be practically useful, modern static analyzers must precisely model the effect of both the statements in the programming language and the frameworks used by the program under analysis. While important, manually addressing these challenges is difficult for at least two reasons: (i) the effects on the overall analysis can be non-trivial, and (ii) as the size and complexity of modern libraries increase, so does the number of cases the analysis must handle. In this paper we present a new, automated approach for creating static analyzers: instead of manually providing the various inference rules of the analyzer, the key idea is to learn these rules from a dataset of programs. Our method consists of two ingredients: (i) a synthesis algorithm capable of learning a candidate analyzer from a given dataset, and (ii) a counter-example guided learning procedure which generates new programs beyond those in the initial dataset, critical for discovering corner cases and ensuring the learned analysis generalizes to unseen programs. We implemented and instantiated our approach to the task of learning JavaScript static analysis rules for a subset of points-to analysis and for allocation sites analysis. These are challenging yet important problems that have received significant research attention. We show that our approach is effective: our system automatically discovered practical and useful inference rules for many cases that are tricky to manually identify and are missed by state-of-the-art, manually tuned analyzers.

    How to Host a Data Competition: Statistical Advice for Design and Analysis of a Data Competition

    Data competitions rely on real-time leaderboards to rank competitor entries and stimulate algorithm improvement. While such competitions have become quite popular and prevalent, particularly in supervised learning formats, their implementations by the host are highly variable. Without careful planning, a supervised learning competition is vulnerable to overfitting, where the winning solutions are so closely tuned to the particular set of provided data that they cannot generalize to the underlying problem of interest to the host. Drawing on our experience, this paper outlines some important considerations for strategically designing relevant and informative data sets to maximize the learning outcome from hosting a competition. It also describes a post-competition analysis that enables robust and efficient assessment of the strengths and weaknesses of solutions from different competitors, as well as greater understanding of the regions of the input space that are well-solved. The post-competition analysis, which complements the leaderboard, uses exploratory data analysis and generalized linear models (GLMs). The GLMs not only expand the range of results we can explore, they also provide more detailed analysis of individual sub-questions including similarities and differences between algorithms across different types of scenarios, universally easy or hard regions of the input space, and different learning objectives. When coupled with a strategically planned data generation approach, the methods provide richer and more informative summaries to enhance the interpretation of results beyond just the rankings on the leaderboard. The methods are illustrated with a recently completed competition to evaluate algorithms capable of detecting, identifying, and locating radioactive materials in an urban environment. Comment: 36 page

    Hydrology and Water Quality in the Central Kentucky Karst: Phase II Part A: Preliminary Summary of the Hydrogeology of the Mill Hole Sub-Basin of the Turnhole Spring Groundwater Basin

    Water from upland areas flows to small ephemeral and perennial springs that feed sinking streams that are tributary to low-order cave streams. These cave streams, also recharged by diffuse percolation, are part of a dendritic network in which intermediate-order streams join high-order streams that flow to major trunk streams. The trunk in the Mill Hole Sub-basin flows across the bottom of a large karst window, Mill Hole, and joins the trunk of the Patoka Creek Sub-basin. Their combined discharge bifurcates, flows around the collapsed central core of a larger karst window, Cedar Sink, and re-joins to flow as one to Turnhole Spring, along the south bank of Green River. The location of the major trunk streams can be inferred from the position and orientation of well-defined troughs in the piezometric surface. Flow velocities over the same 5-mile distance, erroneously assuming a straight path from Parker Cave to Mill Hole, range from 60 to 1100 ft per hour, depending on whether discharge is at flood or base flow conditions. Actual velocity extremes are probably lower and higher.
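To give a sense of scale for the quoted velocities, the short sketch below converts them into travel times over the stated 5-mile distance. It keeps the same straight-path simplification that the abstract itself flags as erroneous, so the figures are illustrative only; the function name `travel_time_hours` is hypothetical.

```python
FEET_PER_MILE = 5280


def travel_time_hours(distance_miles, velocity_ft_per_hour):
    """Hours needed to cover distance_miles at a constant velocity."""
    return distance_miles * FEET_PER_MILE / velocity_ft_per_hour


# The two velocity extremes quoted in the abstract, over the 5-mile distance.
for label, velocity in (("slowest quoted", 60), ("fastest quoted", 1100)):
    hours = travel_time_hours(5, velocity)
    print(f"{label}: {velocity} ft/h -> {hours:.0f} h ({hours / 24:.1f} days)")
```

At the quoted extremes, this works out to 440 hours (about 18.3 days) at 60 ft per hour and 24 hours at 1100 ft per hour.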